Plasmodium malariae structure and genetic diversity in sub-Saharan Africa determined from microsatellite variants and linked SNPs in orthologues of antimalarial resistance genes

Plasmodium malariae, a neglected human malaria parasite, contributes up to 10% of malaria infections in sub-Saharan Africa (sSA). Though P. malariae infection is considered clinically benign, it presents mostly as coinfections with the dominant P. falciparum. Completion of its reference genome has paved the way to further understand its biology and interactions with the human host, including responses to antimalarial interventions. We characterized 75 P. malariae isolates from seven endemic countries in sSA using highly divergent microsatellites. The P. malariae infections were highly diverse and five subpopulations from three ancestries (independent of origin of isolates) were determined. Sequences of 11 orthologous antimalarial resistance genes, identified low frequency single nucleotide polymorphisms (SNPs), strong linkage disequilibrium between loci that may be due to antimalarial drug selection. At least three sub-populations were detectable from a subset of denoised SNP data from mostly the mitochondrial cytochrome b coding region. This evidence of diversity and selection calls for including P. malariae in malaria genomic surveillance towards improved tools and strategies for malaria elimination.

www.nature.com/scientificreports/ Targeted characterisation of gene loci has also been explored to describe P. malariae populations. Genetic variations in P. malariae antigenic proteins thrombospondin-related anonymous protein-TRAP, apical membrane antigen 1-AMA1, and 6-cysteine protein-P48/45 were reported in Asia (Thailand, Myanmar and Lao) 4 . These showed high mutational diversity in P. malariae trap and ama1 compared to p48/45, with mostly non-synonymous mutations that suggest genetic variation in these genes is maintained by positive diversifying selection 4 . Similarly, new non-synonymous mutations were observed in polymorphic sites of the dhfr gene, leading to the description of six new candidate DHFR alleles associated with sulfadoxine-pyrimethamine (SP) resistance 5 . This has been further shown in recent P. malariae whole genome sequence analyses that reported four synonymous and seven non-synonymous single nucleotide polymorphisms (SNPs) in some orthologs of drug resistance genes 6 . All these studies suggest high diversity in P. malariae parasite species as well as frequent recombination and gene flow between populations. To further investigate this, we explored the genetic diversity and population structure of P. malariae across seven endemic countries in sub-Saharan Africa using highly polymorphic microsatellite loci and describe the SNPs in orthologous genes associated with antimalarial resistance in P. falciparum.

P. malariae samples. Archived blood samples at the Medical Research Council Unit The Gambia (MRCG)
and those from collaborators within the Pathogen Diversity Network Africa (PDNA) were accessed following mutual consent and ethical permissions. The P. malariae infected samples analyzed were from seven sub-Saharan countries with varying malaria transmission settings and antimalarial drug use and resistance profiles ( Table 1). Samples processed were either dried blood spot or already extracted DNA, except for those from Nigeria where leucocyte depleted red blood cell (RBC) were used. DNA was extracted using QIAamp DNA mini kit (QIAGEN, Germany) before confirmatory real-time PCR diagnosis of P. malariae infection was done as previously described 7 . The resulting 75 confirmed P. malariae infections (mostly co-infected with P. falciparum by qPCR analysis) were used for further genetic diversity analysis as shown in Table 1.

Microsatellite amplification and fragment analysis.
Multi-locus genotyping of P. malariae was carried out using five highly divergent microsatellites (Pm_02, Pm_09, Pm_11, Pm_34 and Pm_47) in a seminested PCR reaction as previously described 2 . The primary amplification contained 3 µL of P. malariae DNA in a 15 µL reaction, 1 × Thermopol buffer (New England Biolabs), MgCl 2 at 1.5 mM, 0.6 U of Taq Polymerase (New England Biolabs), 0.8 mM of dNTPs and 0.05 μM each of forward and reverse primer. Same conditions applied for the semi-nested reaction using 2 μL of the primary amplification product in a 35 μL reaction with forward and reverse primer concentration at 0.1 μM. Cycling conditions for the primary amplification were-initial denaturation for 4 min at 94 °C followed by 35 cycles of denaturation for 30 s at 94 °C, annealing at 48 °C for 30 s, extension for 60 s at 68 °C and then a final extension for 2 min at 68 °C. The same cycling conditions were used for the semi-nested amplification except for annealing at 52 °C and a total of 20 cycles. Microsatellite PCR products were separated on a 3730 capillary sequencer (Applied Biosystems) following dilution (1:50-1:100) and post-amplification mixing of differently labelled and sized loci. Analysis of electropherograms was carried out using Geneious Prime Version 2022.0.1 (http:// www. genei ous. com) and alleles were scored automatically in comparison with size standard HD400 (Applied Biosystems).
Targeted amplicon sequencing of antimalarial resistance genes. Eleven P. malariae orthologues of P. falciparum antimalarial resistance genes were amplified and sequenced (Table 2). Specific primers were designed for each of the genes and optimum conditions for amplification with Q5 Polymerase (New England Biolabs) are shown in Table 3. Amplified products were verified on 1% agarose gel and all amplicons were pooled per sample for deep sequencing library preparation using the TruSeq HT library prep kit. Libraries were cleaned with the Agencourt AMPure XP PCR purification kit (Beckman Coulter, Brea, CA, USA) according to the manufacturer's protocol. Amplicon concentration and size were measured on a Qubit fluorometer (Invitrogen, Carlsbad, CA, USA) and Tapestation (Agilent). Paired-end sequencing was performed on the MiSeq (Illumina, San  11 and Evenness 12,13 were calculated using the summary function "poppr" in R 14 . Pairwise Bruvo's genetic distance was calculated with the microsatellite markers for all isolates using the command "bruvo.dist" in R and visualized as a hierarchical heat map. To determine population structure, the optimal number of genetic clusters was first determined by running k-means sequentially with increasing values of k and comparing different clustering solutions using Bayesian Information Criterion (BIC) in R software. In addition, the 'find.clusters' function in R's adegenet package (version 2.0) was used to assign each individual into genetic clusters. Discriminant analysis of principal component (DAPC) was applied to describe clusters of genetically related individuals and displayed using scatter plots. DAPC transforms the data using principal component analysis (PCA), and then performs a discriminant analysis on the principal components retained using a cross-validation method. Ancestry was determined with the admixture model using STRU CTU RE Version 2.3.4 15 with individuals assigned to K populations based on their multilocus genotype. The estimated K likelihood value was set to 1-9 and STRU CTU RE was used with 100,000 MCMC Reps over 10,000 Burn-in period. The most appropriate value of K was determined using calculated ΔK implemented in Structure Harvester Web v0.6.94 16 . Ancestry proportions were then visualized as bar plots.
Targeted candidate drug resistance gene sequencing and SNP analysis. The quality of Fastq files obtained was checked using FASTQC, then trimmed and filtered to remove poor quality segments and indices. The reads were processed as shown in Fig. 1-reads were either mapped directly to the reference of concatenated target genes obtained from PLASMODB OR filtered for missingness before mapping to the reference OR passed through a denoising pipeline (dada2 version 1.16.0) before filtration and reference mapping. Filtration for missingness was done in three steps (i) samples with greater than 80% missingness were removed from the original dataset (ii) loci with greater than 70% missingness from the dataset obtained from (filtration 1) were removed, and (iii) samples with greater than 40% missingness from the dataset obtained from (filtration 2) were removed (summarized in Supplementary Table 1). Reads were mapped as merged or unmerged using both BWA and BOWTIE separately. Variant calling from each bam file was done using both BCFtools and freebayes. The resulting vcf files of variants were compared and concordant SNPs were identified for further analysis. To determine the structure of isolates from SNPs, DAPC was applied as described above and the first two coordinates visualised on a scatter plot. Pairwise population comparisons of the P. malariae orthologous drug resistance SNPs by Nei's genetic distance were visualised as a hierarchical heat map. SNP-based genetic complexity within samples (complexity or multiplicity of infection) was assessed by estimating Wright's inbreeding co-efficient (Fws). The F WS metric (values range from 0 to 1) is a measure of within-host diversity that describes the relationship between the genetic diversity of an individual infection relative to the genetic diversity of the parasite population 17 . A low F WS indicates high within-host diversity relative to the population. The extent of linkage disequilibrium (LD) between SNPs of the orthologous drug resistance targets was also determined and displayed using the LDheatmap R package.

Results
Genetic differentiation, population structure, ancestry and complexity of infection. All five microsatellite loci were highly polymorphic, with 8-12 alleles and a mean of 9.8 alleles per locus. Pm_09 was the most polymorphic microsatellite and Pm_11 was the least variable ( Table 4). The expected heterozygosity (Nei's genetic diversity) averaged across all loci was 0.72, ranging from 0.61 (Pm_11) to 0.85 (Pm_02). This measure provides an estimation of the probability that two randomly selected genotypes are different on a scale of 0 (no genotypes are different) to 1 (all genotypes are different). Mean genotypic evenness was 0.65; evenness is a measure of the distribution of genotype abundances, where populations dominated by a single genotype have values closer to zero and populations with equally abundant genotypes yield values close to one 18 . At the country population level, only two pairs of MLGs were found in the isolates from Burkina Faso and Nigeria. Genotypic diversity and richness were high as indicated by high lambda or Simpson's Index and expected heterozygosity-Nei's unbiased gene diversity ( Table 5). The least heterozygosity was observed for isolates from Cameroon (0.47), while the highest was for isolates from Ghana (0.80), with both including only 3 isolates. A total of 20 private alleles were observed across all populations with 50% of them seen in Tanzania. Genotypic evenness was high across populations (overall E.5 = 0.972). There was a wide variation in linkage disequilibrium between loci in populations as determined by the Index of Association (IA), ranging from 0 in Ghana to 1 for Cameroonian isolates.
The hierarchical heat map of genetic distances derived from microsatellite and SNP data indicated that the relationship among the P. malariae isolates was independent of their geographic origin (Fig. 2), with West, Central and East African isolates clustering together in most of the clades. Three genetic clusters were observed with the SNP data, which showed overall lower higher genetic distances due to the low frequencies of minor alleles and this was more evident with the filtered and denoised datasets (Fig. 2b,c). The distribution between the SNP and microsatellite distance differs, with wider distances between smaller number of isolates observed with the microsatellite data ( Supplementary Fig. 1). This is expected as microsatellites are multi-allelic, likely neutral, and more likely to differ between pairs of isolates. Estimates of Wrights fixation index (F ST ) of differentiation  www.nature.com/scientificreports/ between populations was biased by small sample sizes but were low overall, and indication of poor differentiation between country populations due to intrapopulation genetic variation. The SNP data however showed relatively high genetic distance between Cameroon (which had the smallest number of sample) and Burkina Faso (F ST = 0.437) ( Supplementary Fig. 2). Further, microsatellite data identified five genetic clusters, which by DAPC scatter display were not also defined by the geographic origin of the isolates. Similarly, SNP based clustering using the unfiltered dataset grouped the isolates into five clusters with each cluster having isolates from different geographic origins (Fig. 3). While the pattern of geographic-independent clustering was still observed with the filtered and denoised datasets, the clusters were less defined from a reduced number of isolates especially in the denoised data (Fig. 3c,d). Using STRU CTU RE software, the optimal number of clusters by Evanno method 19 was defined as K = 3 (Fig. 4a,b), though additional peaks were detected in the Evanno graph at K = 6 and K = 8. Considering 70% as the probability threshold to assign an individual to a particular cluster, admixture modelling grouped isolates into three ancestral clusters (Fig. 4c), none of which was unique to isolates from any geographic setting.
Mixed genotyped infections (F WS values < 0.95) were identified in the SNP data from all populations, though most isolates had high Fws from the denoised data, suggesting a single dominant genotype (Fig. 5a,b). Isolates from Nigeria and Ghana had the lowest mean Fws, thus more complex.
SNPs at orthologous drug resistance loci. Sequence alignment of the orthologous drug-resistance genes with BOWTIE identified five more mutations from the unmerged reads compared to the merged reads with both variant calling tools, whereas the merged reads identified significantly more variants with BWA sequence alignment (Supplementary Tables 2 and 3). Combining all the sequence alignment and variant calling algorithms, twenty concordant SNPs from the two approaches (though occurring in low frequencies) were retained; of which seven were retained from the denoised and filtered dataset ( Table 6). Eight of these coded for nonsynonymous variants, resulted in amino acid substitutions and interestingly, three of the four concordant SNPs reported in Pmaat1 were nonsynonymous. The nonsynonymous SNP observed in P. malariae sulfadoxine   www.nature.com/scientificreports/ resistance gene (Pmdhps) at position S452C aligned in close proximity with the S436A/F mutation in P. falciparum. This was also the case for the nonsynonymous SNP observed in the multi-drug resistance gene (Pmmdr1) at position V100L, lying in close proximity to the N86Y/F mutation in P. falciparum. The linkage disequilibrium (LD) heat map revealed significant patterns of LD between SNPs on different orthologous drug resistance genes. High r 2 values were observed between most of the SNPs on the mitochondrial Pmcytb gene and between Pmcytb SNPs and those on other targets such as Pmdhps, Pmdhfr and Pmmdr1 from the unfiltered dataset (Fig. 6a). The high LD observed in the mitochondrial SNPs were retained in both the filtered and denoised datasets (Fig. 6b,c).

Discussion and conclusion
Malaria elimination programs and tools are focused on eliminating the main malaria parasites P. falciparum and P. vivax, but other malaria parasites species such as P. malariae are co-transmitted in malaria endemic regions and warrant attention for achieving elimination. As knowledge of the diversity and effect of elimination tools on these minor species have not had much academic or public health focus, this study took advantage of the Pathogen Diversity Network Africa (PDNA), to collate a small but the widest sample set of P. malariae isolates across seven African countries and one from Asia, describing population structure, high genetic diversity, genotypic richness and evenness in the species of malaria parasite. A high level of genetic diversity is essential for the long-term survival of populations, and the extent of variation determines the ability of the species to adapt to environmental challenges imposed by nature or control interventions. The high diversity in P. malariae found here is similar to those previously reported in Kenya and Malawi 2,3 , despite the variable and small number of samples analysed from some of the countries. Malaria transmission intensity and history of interventions vary across the different countries represented in this study and could have affected the results. High transmission results in frequent heterologous recombination of the parasite in the mosquito vector, breaking down linkage disequilibrium between variable loci and increasing the genetic diversity within populations. In general, the probability of outcrossing in parasite populations vary from a high to low malaria transmission gradient in the West, Central and Eastly direction in sub-Saharan Africa 20 . This www.nature.com/scientificreports/ notwithstanding, heterozygosity was high and genetic distance was low between isolates despite the differences in malaria transmission intensity, except for the three isolates from Cameroon. The low genetic distance observed in this study require further validation since there are no studies on P. malariae for direct comparison. However, the results obtained are consistent with a recent P. falciparum study in Nigeria 21 howbeit in sharp contrast with an older study in Senegal 22 , suggesting relatively high levels of panmixia in the current sampled populations despite the varying transmission patterns. Indeed, multi-locus genotypes were rare across all populations, an indication of recombination and absence of the clonal expansion observed in some P. falciparum populations with low or seasonal malaria transmission 23 . Recurrent gene flow promotion between parasite populations across countries via human or vector migration, could have led to a lack of differentiation according to geographic origin 3 . This was evident in low microsatellite differentiation indices between countries, although the small number of isolates per country population limits the accuracy of the indices inferred. It is also possible that gene flow alone does not explain the high genetic diversity or lack of geographic differentiation observed in P. malariae, other factors such as a lack of bottleneck event or intervention to reduce local diversity could also be considered.
Population structure analysis with neutral microsatellite loci (i.e SSRs) identified five clusters, each with isolates from different countries. This is further indication of high intrapopulation variability in these markers and absence of population specific selection of these loci that may drive population differentiation. This structure is not consistent with isolation by distance seen for P. falciparum, where genetic clusters can be allocated to geographic populations in the west, central and eastern African regions. As P. malariae mostly occurs as coinfections with P. falciparum, the drivers of such independent substructure may therefore be different, or it is possible that this could have been established by an earlier event preceding any population drift due to demography or isolation. Using the admixture model in STRU CTU RE, an optimum of three ancestral clusters were determined and this also showed all isolates with components of each ancestry irrespective of the country of origin. STRU CTU RE implements a Bayesian algorithm to identify groups of individuals at Hardy-Weinberg and linkage equilibrium. However, its robustness was shown to be impacted by small uneven sample sizes between subpopulations and/ or hierarchical levels of population structure 24,25 .
The SNP data from selected P. malariae orthologues of P. falciparum drug resistance genes also clustered isolates into 5 subpopulations, although only 3 less distinct clusters were retained after stringent filtration processes, and membership of the subgroups did not overlap fully with those determined by microsatellites. The distribution between the SNP and microsatellite distance differed, with wider distances between smaller number of isolates using SSR data. This is expected as SSRs are multi-allelic, likely neutral, and more likely to differ between pairs of isolates. Thus, further investigation into the possible drivers of population differentiation for this parasite species will improve understanding of its complexity, particularly with regard to control and elimination strategies. While it appeared that most infections had mixed genomes (polygenomic) as indicated by Fws, this was affected by the denoising and filtration pipelines used for analysis. Thus, further investigation using appropriate sample size will be necessary to clarify if co-transmission of different clones and increased possibility of recombination exists for this Plasmodium species. High level of infection complexity is important to note, given that it is one of the indices for monitoring the effect of interventions. Unlike for P. falciparum, the complexity did not seem to be higher in relatively higher malaria transmission settings and may be part of the unique biology of this species that needs further investigation. As drugs and other interventions drive down populations, selection and changes in complexity should be monitored for the species.
Drugs have been a major selective force on P. falciparum, with resistance associated with mutations in several genes and signatures of positive selection across the genomes. We identified 20 P. malariae mutations in the orthologous drug resistant genes by combining different sequence alignment and variant calling algorithms. These putative variants have not been described in previous targeted or genome scans, probably because of differences in isolates used or the methods applied. Here we retained only high-quality variants supported by combinations of two mapping and two SNP calling algorithms. Most of the candidate variants were synonymous but there were several nonsynonymous SNPs across seven genes, especially in Pmcytb, whose P. falciparum orthologue drive resistance against atovaquone. Atovaquone is a member of the quinolines, to which resistance in P. falciparum have been associated with mutations in multidrug resistance gene (Pfmdr1), chloroquine resistance transporter (Pfcrt) and an amino acid transporter (Pfaat1). Most of the LD observed with the unfiltered dataset were not reproducible with the denoised and filtered dataset, with the exception of the LD observed in the SNPs of the mitochondrial gene, either due to common ancestry or selection of dominant haplotype by drugs or other factors. Additional candidate variants were seen in Pmdhfr and Pmdhps, orthologues of antifolate resistance in P. falciparum. The antifolate antimalarials, sulfadoxine-pyrimethamine are still in wide use for chemoprevention against malaria in pregnancy and combination with amodiaquine for seasonal malaria chemoprevention in West Africa. These together could be selecting for the identified variants. While the nonsynonymous SNPs reported here occurred in low frequencies, further verification, characterization and association of these SNPs will require increased genomic surveillance and phenotype association studies from in vivo and ex vivo therapeutic efficacy tests.
A limitation of this study, which warrants cautious interpretation of the results, is the small number of samples analyzed across the different countries and the lack of bio and clinical data of the samples. Larger population studies for P. malariae with appropriate epidemiology or clinical data are required to validate the findings of smaller studies as reported here. Another limitation is the use of standard bioinformatics pipelines designed for P. falciparum. While these may be acceptable for preliminary analysis, custom pipelines that take possible amplification and sequencing errors into consideration may be better for P. malariae, particularly because of the scarcity of population data with high quality confirmed variants from this parasite species. The different sample types analyzed could also be a limitation to this study, the dried blood spot samples were more amenable to producing poor quality results, possibly due to the low prevalence and low parasite density of the non-falciparum species. www.nature.com/scientificreports/ Therefore, venous blood sampling and more robust molecular techniques that take these into consideration will be beneficial in future molecular surveillance of P. malariae. The current drive for malaria elimination needs innovative strategies to target all malaria parasites. One approach can be by integrating genomic surveillance of all Plasmodium species into malaria control and elimination programs in sub-Saharan Africa, learning from the experience with COVID-19, to refine approaches as new variants are identified and monitored. This study has established the relevance of this in P. malariae.